Hybrid Stemmer for Gujarati

نویسندگان

Pratikkumar Patel

Kashyap Popat

Pushpak Bhattacharyya

چکیده

In this paper we present a lightweight stemmer for Gujarati using a hybrid approach. Instead of using a completely unsupervised approach, we have harnessed linguistic knowledge in the form of a hand-crafted Gujarati suffix list in order to improve the quality of the stems and suffixes learnt during the training phase. We used the EMILLE corpus for training and evaluating the stemmer’s performance. The use of hand-crafted suffixes boosted the accuracy of our stemmer by about 17% and helped us achieve an accuracy of 67.86 %.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Lightweight Stemmer for Gujarati

Gujarati is a resource poor language with almost no language processing tools being available. In this paper we have shown an implementation of a rule based stemmer of Gujarati. We have shown the creation of rules for stemming and the richness in morphology that Gujarati possesses. We have also evaluated our results by verifying it with a human expert.

متن کامل

A Study and Comparative Analysis of Different Stemmer and Character Recognition Algorithms for Indian Gujarati Script

A lot of work has been reported on optical character recognition for various non-Indian scripts like Chinese, English and Japanese and Indian scripts like Tamil, Hindi Telugu, etc. , in this paper, we present a literature review on stemmer, optical character recognition (OCR) and Text mining work on Indian scripts, mainly on the Gujarati languages. We have discussed the different techniques for...

متن کامل

Improving a Lightweight Stemmer for Gujarati Language

The origin of route of text mining is the process of stemming. It is usually used in several types of applications such as Natural Language Processing (NLP), Information Retrieval (IR) and Text Mining (TM) including Text Categorization (TC), Text Summarization (TS). Establish a stemmer effective for the language of Gujarati has been always a search domain hot since the Gujarati has a very diffe...

متن کامل

Improving the quality of Gujarati-Hindi Machine Translation through part-of-speech tagging and stemmer-assisted transliteration

Machine Translation for Indian languages is an emerging research area. Transliteration is one such module that we design while designing a translation system. Transliteration means mapping of source language text into the target language. Simple mapping decreases the efficiency of overall translation system. We propose the use of stemming and part-of-speech tagging for transliteration. The effe...

متن کامل

Morphological Analyzer for Gujarati using Paradigm based approach with Knowledge based and Statistical Methods

Morphological Analyzer is a tool which performs syntactic analysis of a word and finds root form of input inflected word form. Morph analyzer serves as a pre-processing tool for many NLP applications. Significant amount of work has been done in this area for many Indian languages but not much work has been reported for Gujarati language. We present Morph analyzer for Gujarati language. The Morp...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Hybrid Stemmer for Gujarati

نویسندگان

چکیده

منابع مشابه

A Lightweight Stemmer for Gujarati

A Study and Comparative Analysis of Different Stemmer and Character Recognition Algorithms for Indian Gujarati Script

Improving a Lightweight Stemmer for Gujarati Language

Improving the quality of Gujarati-Hindi Machine Translation through part-of-speech tagging and stemmer-assisted transliteration

Morphological Analyzer for Gujarati using Paradigm based approach with Knowledge based and Statistical Methods

عنوان ژورنال:

اشتراک گذاری